Low-Rank Signal Processing: Design, Algorithms 
for Dimensionality Reduction and Applications 


Rodrigo C. de Lamare 


Abstract 

We present a tutorial on reduced-rank signal processing, design methods and algorithms for dimensionality 
reduction, and cover a number of important applications. A general framework based on linear algebra and linear 
estimation is employed to introduce the reader to the fundamentals of reduced-rank signal processing and to describe 
how dimensionality reduction is performed on an observed discrete-time signal. A unified treatment of dimensionality 
reduction algorithms is presented with the aid of least squares optimization techniques, in which several techniques for 
designing the transformation matrix that performs dimensionality reduction are reviewed. Among the dimensionality 
reduction techniques are those based on the eigen-decomposition of the observed data vector covariance matrix, 
Krylov subspace methods, joint and iterative optimization (JIO) algorithms and JIO with simplified structures and 
switching (JIOS) techniques. A number of applications are then considered using a unified treatment, which includes 
wireless communications, sensor and array signal processing, and speech, audio, image and video processing. This 
tutorial concludes with a discussion of future research directions and emerging topics. 


I. Introduction 


Reduced-rank signal processing is an area of signal processing that is strategic for dealing with high-dimensional
data, low-sample-support situations and large optimization problems, and it has gained considerable attention in
the last decade [1], [2]. The origins of reduced-rank signal processing lie in the problem of
feature selection encountered in statistical signal processing, which refers to a dimensionality reduction process
whereby a data space is transformed into a feature space [3]. The fundamental idea is to devise a transformation
that performs dimensionality reduction so that the data vector can be represented by a reduced number of effective
features and yet retain most of the intrinsic information content of the input data [3]. The goal is to find the
best trade-off between model bias and variance in a cost-effective way, yielding a reconstruction error as small as
desired.

Dimensionality reduction is an emerging and strategic topic that promises great advances in the fields of statistical 
signal processing, linear algebra, communications, multimedia, artificial intelligence, optimization, control and 
physics due to its ability to deal with large systems and to offer high performance at low computational cost. 
Central to this idea is the existence of some form of redundancy in the signals being processed, which allows a 
designer to judiciously exploit it by selecting the key features of the signals. While the data in these applications may 
be represented in high dimensions due to the immense capacity for data retrieval of some systems, the important 
features are typically concentrated on lower dimensional subsets—manifolds—of the measurement space. This 
allows for significant dimension reduction with minor or no loss of information. In a number of applications, the 
dimensionality reduction may also lead to a performance improvement due to the denoising property: one retains
the signal subspace and eliminates the noise subspace. Specifically, this redundancy can typically be characterized by
data that exhibits reduced-rank properties and sparse signals. In these situations, dimensionality reduction provides
a means to increase the speed and the performance of signal processing tasks, reduce the requirements for storage 
and improve the tracking performance of dynamic signals. This is particularly relevant to problems which involve 
large systems, where the design and applicability of methods is constrained by factors such as complexity and 
power consumption. 

In general, the dimensionality reduction problem is associated with reduced-rank operators, characterized by a
mapping performed by an $M \times D$ transformation matrix $\mathbf{S}_D$ with $D < M$ that compresses the $M \times 1$ observed data
vector $\mathbf{r}$ into a $D \times 1$ reduced-rank data vector $\mathbf{r}_D$. Mathematically, this relationship is given by $\mathbf{r}_D = \mathbf{S}_D^H \mathbf{r}$, where
$(\cdot)^H$ is the Hermitian transpose. It is desirable to perform these operations so that the reconstruction error and the
computational burden are minimal. The dimensionality reduction and the system performance are characterized by
(a) accuracy, (b) compression ratio (CR), and (c) complexity. The main challenge is how to efficiently and optimally
design $\mathbf{S}_D$. After dimensionality reduction, a signal processing algorithm is used to perform the task desired by
the designer. The resulting scheme with D elements can benefit from a reduced number of parameters, which may 
lead to lower complexity, smaller requirements for storage, faster convergence and better tracking capabilities of 
time-varying signals. 

In the literature, a number of dimensionality reduction techniques have been considered based on principal
components (PC) analysis [4]-[6], random projections [7], [8], diffusion maps [9], incremental manifold learning
[10], clustering techniques [11], [12], Krylov subspace methods that include the multi-stage Wiener filter (MSWF)
[17], [19], [20], [21] and the auxiliary vector filtering (AVF) algorithm [22], [23], joint and iterative optimization
(JIO) techniques [25], [26], [27], [28] and JIO techniques with simplified structures and switching mechanisms
(JIOS) [29]-[31]. It is well known that the optimal linear dimensionality reduction transformation is based on the
eigenvalue decomposition (EVD) of the known input data covariance matrix $\mathbf{R}$ and the selection of the PC. However,
this covariance matrix must be estimated. The approach employed to estimate $\mathbf{R}$ and perform dimensionality
reduction is of central importance and directly affects the performance of the method. Some methods are plagued
by numerical instability, high computational complexity and large sensitivity to the selected rank $D$. A common
and fundamental limitation of a number of existing methods is that they rely on estimates of the covariance matrix
$\mathbf{R}$ of the data vector $\mathbf{r}$ to design $\mathbf{S}_D$, which requires a number of data vectors proportional to the dimension $M$ of
$\mathbf{R}$.

The goal of this paper is to provide a tutorial on this important set of methods and algorithms, to identify
important applications of reduced-rank signal processing techniques as well as new directions and key areas that
deserve further investigation. The virtues and deficiencies of existing methods will be reviewed in this article,
and a discussion of future trends in the area will be provided taking into account application requirements such
as the ability to track dynamic signals, complexity and flexibility. The paper is structured as follows. Section II
introduces the fundamentals of reduced-rank signal processing, the signal model and the idea of dimensionality
reduction using a transformation matrix. Section III covers the design of the transformation matrix and a subsequent
parameter vector using a linear MMSE approach. Section IV reviews several methods available in the literature for
dimensionality reduction and provides a discussion of their main advantages and drawbacks. Section V is devoted
to the applications of these methods, whereas Section VI draws the main conclusions and discusses future research
directions.


II. Fundamentals and signal model 

In this section, our goal is to present the fundamental ideas of reduced-rank signal processing and how the
dimensionality reduction is performed. We will rely on an approach based on linear algebra to describe the signal
processing of a basic linear signal model. This model is sufficiently general to account for numerous applications
and topics of interest. Let us consider the following linear signal model at time instant $i$ that comprises a set of
$M$ samples organized in a vector as given by

$$\mathbf{r}[i] = \mathbf{H}\mathbf{s}[i] + \mathbf{n}[i], \quad i = 1, 2, \ldots, P, \tag{1}$$

where $\mathbf{r}[i]$ is the $M \times 1$ observed signal vector which contains the samples to be processed, $\mathbf{H}$ is the $M \times M$
matrix that describes the mixing nature of the model, $\mathbf{s}[i]$ is the $M \times 1$ signal vector that is generated by a given
source, $\mathbf{n}[i]$ is an $M \times 1$ vector of noise samples, and $P$ is the number of observed signal vectors or simply the
data record size.



Fig. 1. Reduced-rank signal processing stages: dimensionality reduction and reduced-rank processing. 

In reduced-rank signal processing, the main idea is to process the observed signal $\mathbf{r}[i]$ in two stages, as illustrated
in Fig. 1. The first stage corresponds to the dimensionality reduction, whereas the second corresponds to the
signal processing in an often low-dimensional subspace. The dimensionality reduction is performed by a mapping
represented by a transformation matrix $\mathbf{S}_D$ with dimensions $M \times D$, where $D < M$, that projects the observed data
vector $\mathbf{r}[i]$ with dimension $M \times 1$ onto a $D \times 1$ reduced-dimension data vector $\mathbf{r}_D[i]$. This relationship is expressed by

$$\mathbf{r}_D[i] = \mathbf{S}_D^H\mathbf{r}[i] = \mathbf{S}_D^H(\mathbf{H}\mathbf{s}[i] + \mathbf{n}[i]). \tag{2}$$

Key design criteria for the transformation $\mathbf{S}_D$ and the dimensionality reduction are the reconstruction error, the
computational complexity and the compression ratio $\mathrm{CR} = M/D$. These parameters usually depend on the
application and the design requirements.

After the dimensionality reduction, an algorithm is used to perform the signal processing task on the
reduced-dimension observed vector $\mathbf{r}_D[i]$ according to the designer's aims. The resulting scheme with $D$ elements is
expected to benefit from a reduced number of parameters, which may lead to lower complexity, smaller requirements
for storage, faster convergence and superior tracking capability. In the case of a combination of weights (filtering)
by a parameter vector with $D$ coefficients $\mathbf{w}_D = [w_1 \; w_2 \; \ldots \; w_D]^T$, we have the following output

$$x[i] = \mathbf{w}_D^H\mathbf{r}_D[i] = \mathbf{w}_D^H\mathbf{S}_D^H\mathbf{r}[i]. \tag{3}$$

It is expected that the output of the reduced-rank signal processing system will yield a small reconstruction error
as compared to the full-rank system, and provide extra benefits such as speed of computation and a reduced set of
features for extraction from $\mathbf{r}_D[i]$.
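
The two stages in (1)-(3) can be illustrated with a minimal NumPy sketch. The dimensions, the random mixing matrix and the random (non-optimized) $\mathbf{S}_D$ and $\mathbf{w}_D$ below are assumptions made purely for illustration; the helper names `crandn` and `reduced_rank_output` are hypothetical and do not come from the literature.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 8, 2  # illustrative dimensions (assumed values)


def crandn(*shape):
    """Complex Gaussian samples with unit variance (helper for this sketch)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


H = crandn(M, M)    # mixing matrix of the model in (1)
S_D = crandn(M, D)  # placeholder transformation matrix, not an optimized design
w_D = crandn(D)     # placeholder reduced-rank parameter vector


def reduced_rank_output(r, S_D, w_D):
    """Apply (2) and (3): r_D = S_D^H r and x = w_D^H r_D."""
    r_D = S_D.conj().T @ r
    x = np.vdot(w_D, r_D)  # np.vdot conjugates its first argument
    return r_D, x


s = crandn(M)
n = 0.1 * crandn(M)
r = H @ s + n  # observed vector, eq. (1)
r_D, x = reduced_rank_output(r, S_D, w_D)
CR = M / D     # compression ratio
```

With these values the second stage operates on 2 coefficients instead of 8, which is where the savings in parameters and storage discussed above originate.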

III. Linear MMSE Design of Reduced-Rank Techniques 

In this section, we will consider a framework for reduced-rank techniques based on the linear minimum
mean-square error (MMSE) design. The basic idea is to find a reduced-rank model that can represent the original full-rank
model by extracting its key features. The main goal is to present the design of the main components employed
for reduced-rank processing and examine the model order selection using a simple and yet general approach. Let
us consider the $M \times 1$ observed signal vector $\mathbf{r}[i]$ in (1). For the sake of simplicity and for general illustrative
purposes, we are interested in designing reduced-rank techniques with the aid of linear MMSE design techniques.
In order to process the data vector $\mathbf{r}[i]$ with reduced-rank techniques, we need to solve the following optimization
problem

$$[\mathbf{S}_{D,\mathrm{opt}}, \mathbf{w}_{D,\mathrm{opt}}] = \arg\min_{\mathbf{S}_D, \mathbf{w}_D} E\big[\,|d[i] - \underbrace{\mathbf{w}_D^H\mathbf{S}_D^H\mathbf{r}[i]}_{x[i]}|^2\,\big], \tag{4}$$

where $d[i]$ is the desired signal and $E[\cdot]$ stands for the expected value operator.

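The optimal solution $\mathbf{w}_{D,\mathrm{opt}}$ of the optimization problem in (4) is obtained by fixing $\mathbf{S}_D$, taking the gradient
terms of the argument with respect to $\mathbf{w}_D^*$ and equating them to a null vector [1], [38], which yields

$$\mathbf{w}_{D,\mathrm{opt}} = \bar{\mathbf{R}}^{-1}\bar{\mathbf{p}} = (\mathbf{S}_D^H\mathbf{R}\mathbf{S}_D)^{-1}\mathbf{S}_D^H\mathbf{p}, \tag{5}$$

where $\bar{\mathbf{R}} = E[\bar{\mathbf{r}}[i]\bar{\mathbf{r}}^H[i]] = \mathbf{S}_D^H\mathbf{R}\mathbf{S}_D$ is the $D \times D$ reduced-rank correlation matrix, $\bar{\mathbf{p}} = E[d^*[i]\bar{\mathbf{r}}[i]] = \mathbf{S}_D^H\mathbf{p}$ is the
$D \times 1$ cross-correlation vector of the reduced-rank model and $\bar{\mathbf{r}}[i] = \mathbf{S}_D^H\mathbf{r}[i]$ is the reduced-rank data vector of (2).
The quantities $\mathbf{R} = E[\mathbf{r}[i]\mathbf{r}^H[i]]$ and $\mathbf{p} = E[d^*[i]\mathbf{r}[i]]$ are the corresponding full-rank statistics. The associated MMSE
for a rank-$D$ parameter vector is expressed by

$$\mathrm{MMSE} = \sigma_d^2 - \bar{\mathbf{p}}^H\bar{\mathbf{R}}^{-1}\bar{\mathbf{p}} = \sigma_d^2 - \mathbf{p}^H\mathbf{S}_D(\mathbf{S}_D^H\mathbf{R}\mathbf{S}_D)^{-1}\mathbf{S}_D^H\mathbf{p}, \tag{6}$$

where $\sigma_d^2 = E[|d[i]|^2]$.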

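The optimal solution $\mathbf{S}_{D,\mathrm{opt}}$ of the optimization problem in (4) is obtained by fixing $\mathbf{w}_D$, taking the gradient
terms of the associated MMSE in (6) with respect to $\mathbf{S}_D$ and equating them to a zero matrix. By considering
the eigen-decomposition of $\mathbf{R} = \boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H$, where $\boldsymbol{\Phi}$ is an $M \times M$ unitary matrix with the eigenvectors of $\mathbf{R}$ and
$\boldsymbol{\Lambda}$ is an $M \times M$ diagonal matrix with the eigenvalues of $\mathbf{R}$ in decreasing order, we have

$$\mathbf{S}_{D,\mathrm{opt}} = \boldsymbol{\Phi}_{1:M,1:D}, \tag{7}$$

where $\boldsymbol{\Phi}_{1:M,1:D}$ is an $M \times D$ matrix with orthonormal columns that corresponds to the signal subspace and contains
the $D$ eigenvectors associated with the $D$ largest eigenvalues of $\mathbf{R}$. In our notation, the subscript represents the
number of components in each dimension. For example, the $M \times D$ matrix $\boldsymbol{\Phi}_{1:M,1:D}$ contains the first $D$ columns
of $\boldsymbol{\Phi}$, where each column has $M$ elements.

If we substitute the expression of the optimal transformation $\mathbf{S}_{D,\mathrm{opt}} = \boldsymbol{\Phi}_{1:M,1:D}$ and use the fact that

$$\mathbf{R} = \underbrace{\boldsymbol{\Phi}_{1:M,1:D}\boldsymbol{\Lambda}_{1:D,1:D}\boldsymbol{\Phi}_{1:M,1:D}^H}_{\text{signal subspace}} + \underbrace{\boldsymbol{\Phi}_{1:M,D+1:M}\boldsymbol{\Lambda}_{D+1:M,D+1:M}\boldsymbol{\Phi}_{1:M,D+1:M}^H}_{\text{noise subspace}}$$

in the expression for the optimal reduced-rank parameter vector in (5), we have

$$\mathbf{w}_{D,\mathrm{opt}} = (\mathbf{S}_D^H\mathbf{R}\mathbf{S}_D)^{-1}\mathbf{S}_D^H\mathbf{p} = \boldsymbol{\Lambda}_{1:D,1:D}^{-1}\boldsymbol{\Phi}_{1:M,1:D}^H\mathbf{p}. \tag{8}$$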

The development above shows us that the key aspect for constructing reduced-rank techniques is the design of $\mathbf{S}_D$
since the MMSE in (6) depends on $\mathbf{p}$, $\mathbf{R}$ and $\mathbf{S}_D$. The quantities $\mathbf{p}$ and $\mathbf{R}$ are common to both reduced-rank and
full-rank designs; however, the matrix $\mathbf{S}_D$ plays a key role in the dimensionality reduction and in the performance.
The strategy is to find the most appropriate trade-off between the model bias and variance [3] by adjusting the rank
$D$.
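
As a minimal numerical sketch of (6)-(8), the following NumPy code builds the eigen-based transformation and evaluates the MMSE for every rank. The synthetic statistics are an assumption of this example: the desired signal is taken as $d[i] = \mathbf{w}_0^H\mathbf{r}[i] + v[i]$ with noise variance `sigma_v2`, so that $\mathbf{p} = \mathbf{R}\mathbf{w}_0$ and the full-rank MMSE equals `sigma_v2`. The function name `pc_reduced_rank_wiener` is hypothetical.

```python
import numpy as np


def pc_reduced_rank_wiener(R, p, sigma_d2, D):
    """Rank-D design: S_D from (7), w_D from (8) and the MMSE from (6)."""
    eigvals, eigvecs = np.linalg.eigh(R)    # ascending eigenvalues of Hermitian R
    order = np.argsort(eigvals)[::-1][:D]   # indices of the D largest eigenvalues
    S_D = eigvecs[:, order]                 # Phi_{1:M,1:D}
    p_bar = S_D.conj().T @ p                # reduced-rank cross-correlation vector
    w_D = p_bar / eigvals[order]            # Lambda^{-1} Phi^H p
    mmse = sigma_d2 - np.real(np.vdot(p_bar, w_D))
    return S_D, w_D, mmse


# Synthetic second-order statistics (assumed model: d = w0^H r + v)
rng = np.random.default_rng(1)
M = 6
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + np.eye(M)
w0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
sigma_v2 = 0.01
p = R @ w0
sigma_d2 = np.real(np.vdot(w0, R @ w0)) + sigma_v2

mmse_per_rank = [pc_reduced_rank_wiener(R, p, sigma_d2, D)[2] for D in range(1, M + 1)]
```

With exactly known statistics the MMSE in this sketch cannot increase with $D$; the bias-variance trade-off mentioned above only appears once $\mathbf{R}$ and $\mathbf{p}$ have to be estimated from a finite data record.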

Our exposition assumes so far a reduced-rank linear model with model order $D$ that is able to represent a
full-rank model with dimension $M$. However, it is well known that the performance, compression ratio (CR) and
complexity of such a procedure depend on the model order $D$. In order to address this problem, a number of
techniques have been reported in the literature which include Akaike's information-theoretic criterion (AIC)
[32], the minimum description length criterion [33] and a number of other techniques [34]. The basic idea of
these methods is to determine the model order $D$ for which a given criterion is optimized. This can be cast as the
following optimization problem

$$D_{\mathrm{opt}} = \arg\min_{D} f(D), \tag{9}$$

where $f(D)$ is the objective function that depends on the model order and consists of a suitable criterion for the
problem. This criterion can be either of an information-theoretic nature [34], related to the error of the model [20],
[29], associated with metrics or projections computed from the basis vectors of $\mathbf{S}_D$, or based on cross-validation
techniques [23].






IV. Algorithms for Dimensionality Reduction 

For the sake of simplicity and for general illustrative purposes, we will consider in this section algorithms for
dimensionality reduction based on least squares (LS) optimization techniques with a fixed model order $D$. In order
to process the data vector $\mathbf{r}[i]$ with reduced-rank techniques, we need to solve the following optimization problem

$$[\mathbf{S}_D[i], \mathbf{w}_D[i]] = \arg\min_{\mathbf{S}_D[i], \mathbf{w}_D[i]} \underbrace{\sum_{l=1}^{i}\lambda^{i-l}\big|d[l] - \mathbf{w}_D^H[i]\mathbf{S}_D^H[i]\mathbf{r}[l]\big|^2}_{\mathcal{C}(\mathbf{S}_D[i],\mathbf{w}_D[i])}, \tag{10}$$

where $d[l]$ is the desired signal and $0 < \lambda < 1$ stands for the forgetting factor that is useful for time-varying
scenarios.

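The optimal solution $\mathbf{w}_D[i]$ of the LS optimization in (10) is obtained by fixing $\mathbf{S}_D[i]$, taking the gradient terms
of the argument with respect to $\mathbf{w}_D^*[i]$ and equating them to a null vector, which yields

$$\mathbf{w}_D[i] = \bar{\mathbf{R}}^{-1}[i]\bar{\mathbf{p}}[i] = (\mathbf{S}_D^H[i]\mathbf{R}[i]\mathbf{S}_D[i])^{-1}\mathbf{S}_D^H[i]\mathbf{p}[i], \tag{11}$$

where $\bar{\mathbf{R}}[i] = \sum_{l=1}^{i}\lambda^{i-l}\bar{\mathbf{r}}[l]\bar{\mathbf{r}}^H[l] = \mathbf{S}_D^H[i]\mathbf{R}[i]\mathbf{S}_D[i]$ is the reduced-rank correlation matrix that is an estimate of the
covariance matrix $\bar{\mathbf{R}}$ and $\bar{\mathbf{p}}[i] = \sum_{l=1}^{i}\lambda^{i-l}d^*[l]\bar{\mathbf{r}}[l] = \mathbf{S}_D^H[i]\mathbf{p}[i]$ is the cross-correlation vector of the reduced-rank
model that is an estimate of $\bar{\mathbf{p}}$ at time $i$. The associated sum of error squares (SES) [1] for a rank-$D$ parameter
vector is obtained by substituting (11) into the cost function $\mathcal{C}(\mathbf{S}_D[i], \mathbf{w}_D[i])$ in (10) and is expressed by

$$\mathrm{SES} = \sigma_d^2[i] - \bar{\mathbf{p}}^H[i]\bar{\mathbf{R}}^{-1}[i]\bar{\mathbf{p}}[i] = \sigma_d^2[i] - \mathbf{p}^H[i]\mathbf{S}_D[i](\mathbf{S}_D^H[i]\mathbf{R}[i]\mathbf{S}_D[i])^{-1}\mathbf{S}_D^H[i]\mathbf{p}[i], \tag{12}$$

where $\sigma_d^2[i] = \sum_{l=1}^{i}\lambda^{i-l}|d[l]|^2$. The development above shows us that the optimal filter $\mathbf{w}_D[i]$ and the SES in
(12) depend on $\mathbf{p}[i]$, $\mathbf{R}[i]$ and $\mathbf{S}_D[i]$. The quantities $\mathbf{p}[i]$ and $\mathbf{R}[i]$ are common to both reduced-rank and full-rank
designs; however, the matrix $\mathbf{S}_D[i]$ and the algorithm used for its design play a key role in dimensionality reduction,
the performance and complexity.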
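
The relation between the LS cost in (10), the filter in (11) and the SES in (12) can be checked numerically. The sketch below is only illustrative: the synthetic data, the fixed real-valued orthonormal $\mathbf{S}_D$ and the function name `ls_reduced_rank` are assumptions of this example. The statistics are accumulated recursively, which is equivalent to the exponentially weighted sums defined above.

```python
import numpy as np


def ls_reduced_rank(r_seq, d_seq, S_D, lam=0.998):
    """Exponentially weighted LS statistics, the filter in (11) and the SES in (12)."""
    M = r_seq.shape[1]
    R = np.zeros((M, M), dtype=complex)
    p = np.zeros(M, dtype=complex)
    sigma_d2 = 0.0
    for r, d in zip(r_seq, d_seq):
        R = lam * R + np.outer(r, r.conj())   # R[i] = lam R[i-1] + r[i] r^H[i]
        p = lam * p + np.conj(d) * r          # p[i] = lam p[i-1] + d^*[i] r[i]
        sigma_d2 = lam * sigma_d2 + abs(d) ** 2
    R_bar = S_D.conj().T @ R @ S_D
    p_bar = S_D.conj().T @ p
    w_D = np.linalg.solve(R_bar, p_bar)               # eq. (11)
    ses = sigma_d2 - np.real(np.vdot(p_bar, w_D))     # eq. (12)
    return w_D, ses


rng = np.random.default_rng(2)
P, M, D, lam = 200, 6, 3, 0.998
r_seq = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))
w0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d_seq = r_seq @ w0.conj() + 0.1 * rng.standard_normal(P)   # d[l] = w0^H r[l] + noise
S_D = np.linalg.qr(rng.standard_normal((M, D)))[0]         # fixed, non-optimized S_D

w_D, ses = ls_reduced_rank(r_seq, d_seq, S_D, lam)

# Direct evaluation of the cost in (10) at the LS solution
weights = lam ** np.arange(P - 1, -1, -1)
errors = d_seq - (r_seq @ S_D.conj()) @ w_D.conj()
direct_cost = np.sum(weights * np.abs(errors) ** 2)
```

The value `direct_cost` agrees with `ses` up to rounding, which confirms that (12) is the cost in (10) evaluated at the filter in (11).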

A number of algorithms for computing the transformation $\mathbf{S}_D[i]$ that is responsible for dimensionality reduction
in our model have been reported in the literature. In this work, we will categorize them into eigen-decomposition
techniques, Krylov-based methods, joint iterative optimization (JIO) techniques and techniques based on the JIO
with simplified structures and switching (JIOS). An interesting characteristic that has not been fully explored in
the literature so far is the fact that algorithms for dimensionality reduction can be devised by looking at different
stages of the signal processing. Specifically, one can devise algorithms for dimensionality reduction directly from a
cost function associated with the original optimization problem in (10) or by considering the SES in (12). In what
follows, we will explore the fundamental ideas behind these methods and their main features.

A. Eigen-decomposition techniques 

In this part, we review the basic principles of eigen-decomposition techniques for dimensionality reduction.
Specifically, we focus on algorithms based on the principal component (PC) and the cross-spectral (CS) approaches.
Our aim is to review the main features, advantages and disadvantages of these algorithms. The PC was the first
statistical signal processing method employed for dimensionality reduction and was introduced by Hotelling in
the 1930s [4]. The PC method chooses the subspace spanned by the eigenvectors corresponding to the largest
eigenvalues, which contain the largest portion of the signal energy. Nevertheless, in the case of a mixture of
signals, the PC approach does not distinguish between the signal of interest and the interference signal. Hence, the
performance of the algorithm degrades significantly in interference-dominated scenarios. This drawback of the PC
algorithm motivated the development of another technique, the CS algorithm [16], which selects the eigenvectors
such that the MSE over all eigen-based methods with the same rank is minimal. To do so, it additionally considers
the cross-covariance between the observation and the unknown signal, thus being more robust against strong
interference. A common disadvantage of PC and CS algorithms is the need for eigen-decompositions, which are
computationally very demanding when the dimensions are large and typically have a cubic cost with $M$ ($O(M^3)$).
In order to address this limitation, numerous subspace tracking algorithms have been developed in the last two
decades, which can reduce the cost to a quadratic rule with $M$ [18].

We can illustrate the principle of PC by relying on the framework employed in this section. The optimal solution
$\mathbf{S}_{D,\mathrm{opt}}$ of the least squares optimization in (10) is obtained by fixing $\mathbf{w}_D[i]$, taking the gradient terms of
the associated SES in (12) with respect to $\mathbf{S}_D[i]$ and equating them to a zero matrix. By considering the
eigen-decomposition of $\mathbf{R}[i] = \boldsymbol{\Phi}[i]\boldsymbol{\Lambda}[i]\boldsymbol{\Phi}^H[i]$, where $\boldsymbol{\Phi}[i]$ is an $M \times M$ unitary matrix with the eigenvectors of $\mathbf{R}[i]$
and $\boldsymbol{\Lambda}[i]$ is an $M \times M$ diagonal matrix with the eigenvalues of $\mathbf{R}[i]$ in decreasing order, we have

$$\mathbf{S}_D[i] = \boldsymbol{\Phi}_{1:M,1:D}[i], \tag{13}$$

where $\boldsymbol{\Phi}_{1:M,1:D}[i]$ represents the $D$ eigenvectors associated with the $D$ largest eigenvalues of $\mathbf{R}[i]$. The adjustment
of the model order $D$ of this method as well as the algorithms described in what follows can be performed by a
model order selection algorithm [34].

B. Krylov subspace techniques 

The first Krylov methods, namely, the conjugate gradient (CG) method [13] and the Lanczos algorithm [14], were
originally proposed for solving large systems of linear equations. These algorithms used in numerical linear
algebra are mathematically identical to each other and have been derived for Hermitian and positive definite system
matrices. Other techniques have been reported for solving these problems, and the Arnoldi algorithm [15] is a
computationally efficient procedure for arbitrary invertible system matrices. The multistage Wiener filter (MSWF)
[17] and the auxiliary vector filtering (AVF) [22] algorithms are based on a multistage decomposition of the linear
MMSE estimator. A key feature of these methods is that they do not require an eigen-decomposition and have a
very good performance. It turns out that Krylov subspace algorithms, which are used for solving very large and sparse
systems of linear equations, are suitable alternatives for performing dimensionality reduction. The basic idea of
Krylov subspace algorithms is to construct the transformation matrix $\mathbf{S}_D[i]$ with the following structure:

$$\mathbf{S}_D[i] = \big[\mathbf{q}[i] \;\; \mathbf{R}[i]\mathbf{q}[i] \;\; \cdots \;\; \mathbf{R}^{D-1}[i]\mathbf{q}[i]\big], \tag{14}$$

where $\mathbf{q}[i] = \mathbf{p}[i]/\|\mathbf{p}[i]\|$ and $\|\cdot\|$ denotes the Euclidean norm (or the 2-norm) of a vector. In order to compute the
basis vectors of the Krylov subspace (the vectors of $\mathbf{S}_D[i]$), a designer can either directly employ the expression
in (14) or resort to more sophisticated approaches such as the Arnoldi iteration [15]. An appealing feature of the
Krylov subspace algorithms is that the required model order $D$ does not scale with the system size. Indeed, when
$M$ goes to infinity the required $D$ remains a finite and relatively small value. This result was established by Xiao
and Honig [35]. Among the disadvantages of Krylov subspace methods are the relatively high computational cost
of constructing $\mathbf{S}_D[i]$ ($O(DM^2)$), the numerical instability of some implementations and the lack of flexibility for
imposing constraints on the design of the basis vectors.
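
The direct construction in (14) can be sketched as follows. The statistics $\mathbf{R}$ and $\mathbf{p}$ are random placeholders and the function name `krylov_basis` is hypothetical. The final QR step is an addition of this sketch rather than part of (14): it returns an orthonormal basis of the same subspace, which is one simple way to mitigate the poor conditioning of raw powers of $\mathbf{R}$.

```python
import numpy as np


def krylov_basis(R, p, D):
    """Build S_D = [q, Rq, ..., R^{D-1} q] with q = p / ||p||, as in (14)."""
    q = p / np.linalg.norm(p)
    columns = [q]
    for _ in range(D - 1):
        columns.append(R @ columns[-1])   # one matrix-vector product, O(M^2) each
    return np.column_stack(columns)


rng = np.random.default_rng(3)
M, D = 8, 3
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T / M + np.eye(M)   # placeholder Hermitian positive definite matrix
p = rng.standard_normal(M) + 1j * rng.standard_normal(M)

S_D = krylov_basis(R, p, D)

# Optional step (not part of (14)): orthonormal basis of the same Krylov subspace
Q, _ = np.linalg.qr(S_D)
```

The $D - 1$ matrix-vector products account for the $O(DM^2)$ cost quoted above.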

C. Joint iterative optimization techniques 

The aim of this part is to introduce the reader to dimensionality reduction algorithms based on joint iterative
optimization (JIO) techniques. The idea of these methods is to design the main components of the reduced-rank
signal processing system via a general optimization approach. The basic ideas of JIO techniques have been reported
in [24], [25], [26]. Amongst the advantages of JIO techniques are the flexibility to choose the optimization algorithm
and to impose constraints, which provides a significant advantage over eigen-based and Krylov subspace methods.
One disadvantage that is shared amongst the JIO techniques, eigen-based and Krylov subspace methods is the
complexity associated with the design of the matrix $\mathbf{S}_D[i]$. For instance, if we are to design a dimensionality
reduction algorithm with a very large $M$, we still have the problem of having to design an $M \times D$ matrix $\mathbf{S}_D[i]$.

In the framework of JIO techniques, the design of the matrix $\mathbf{S}_D[i]$ and the parameter vector $\mathbf{w}_D[i]$ for a
fixed model order $D$ will be entirely dictated by the optimization problem. To this end, we will focus on a generic
$\mathbf{S}_D[i] = [\mathbf{s}_1[i] \; \mathbf{s}_2[i] \; \ldots \; \mathbf{s}_D[i]]$, in which the basis vectors $\mathbf{s}_d[i]$, $d = 1, 2, \ldots, D$ will be obtained via an optimization
algorithm and iterations between the $\mathbf{s}_d[i]$ and $\mathbf{w}_D[i]$ will be performed. The JIO method consists of solving the
following optimization problem

$$[\mathbf{s}_1[i], \mathbf{s}_2[i], \ldots, \mathbf{s}_D[i], \mathbf{w}_D[i]] = \arg\min_{\mathbf{s}_1[i], \ldots, \mathbf{s}_D[i], \mathbf{w}_D[i]} \underbrace{\sum_{l=1}^{i}\lambda^{i-l}\Big|d[l] - \mathbf{w}_D^H[i]\sum_{d=1}^{D}\mathbf{q}_d\mathbf{s}_d^H[i]\mathbf{r}[l]\Big|^2}_{\mathcal{C}(\mathbf{s}_1[i], \mathbf{s}_2[i], \ldots, \mathbf{s}_D[i], \mathbf{w}_D[i])}, \tag{15}$$

where $\mathbf{q}_d$ corresponds to a $D \times 1$ vector with a one in the $d$th position and zeros elsewhere. It should be remarked
that the optimization problem in (15) is non-convex; however, the algorithms do not present convergence problems.
Numerical studies with JIO methods indicate that the minima are identical and global. Proofs of global convergence
have been established with different versions of JIO schemes [24], [26], which demonstrate that the LS algorithm
converges to the reduced-rank Wiener filter.



Fig. 2. JIO scheme: dimensionality reduction and reduced-rank processing with T iterations. 


The solution to the problem in (15) for a particular basis vector $\mathbf{s}_d[i]$ for $d = 1, 2, \ldots, D$ can be obtained by
fixing the remaining basis vectors $\mathbf{s}_n[i]$ for $n \neq d$ and $\mathbf{w}_D[i]$, computing the gradient terms of the cost function
$\mathcal{C}(\mathbf{s}_1[i], \mathbf{s}_2[i], \ldots, \mathbf{s}_D[i], \mathbf{w}_D[i])$ defined in (15) with respect to $\mathbf{s}_d^*[i]$ and equating the terms to a null vector. The
solution to (15) for $\mathbf{w}_D[i]$ can be obtained in a similar way. We fix all the basis vectors $\mathbf{s}_d[i]$, compute the gradient
terms of $\mathcal{C}(\mathbf{s}_1[i], \mathbf{s}_2[i], \ldots, \mathbf{s}_D[i], \mathbf{w}_D[i])$ with respect to $\mathbf{w}_D^*[i]$ and equate the terms to a null vector. Moreover,
we also allow the basis vectors $\mathbf{s}_d[i]$ that are the columns of $\mathbf{S}_D[i]$ and the parameter vector $\mathbf{w}_D[i]$ to exchange
information between each other via $t = 1, 2, \ldots, T$ iterations, as illustrated in Fig. 2. This results in the following
JIO algorithm

$$\mathbf{s}_d^{(t)}[i] = \frac{1}{|w_d^{(t-1)}[i]|^2}\,\mathbf{R}^{-1}[i]\big(\mathbf{p}[i]\,w_d^{*(t-1)}[i] - \mathbf{v}_d^{(t)}[i]\big), \quad d = 1, 2, \ldots, D, \; t = 1, 2, \ldots, T, \tag{16}$$

$$\mathbf{w}_D^{(t)}[i] = \big(\mathbf{S}_D^{H,(t)}[i]\mathbf{R}[i]\mathbf{S}_D^{(t)}[i]\big)^{-1}\mathbf{S}_D^{H,(t)}[i]\mathbf{p}[i], \quad t = 1, 2, \ldots, T, \tag{17}$$

where $\mathbf{v}_d^{(t)}[i] = \sum_{l=1}^{i}\lambda^{i-l}\mathbf{r}[l]\,w_d^{*(t-1)}[i]\sum_{n=1, n \neq d}^{D}\mathbf{r}^H[l]\mathbf{s}_n^{(t)}[i]\,w_n^{(t-1)}[i]$ is an $M \times 1$ vector with cross-correlations. The
recursions in (16) and (17) are updated over time and iterated $T$ times for each instant $i$ until convergence to
a desired solution. In practice, the iterations can improve the convergence performance of the algorithms and it
suffices to use $T = 2$ or $3$ iterations. In terms of complexity, the JIO techniques have a computational cost that is
related to the optimization algorithm. With recursive LS algorithms the complexity is quadratic with $M$ ($O(M^2)$),
whereas the complexity can be as low as linear with $DM$ when stochastic gradient algorithms are adopted [29].
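
The alternation between the basis vectors and the parameter vector can be sketched for a fixed time instant as follows. This is a batch illustration under stated assumptions, not the adaptive algorithm itself: the statistics $\mathbf{R}[i]$ and $\mathbf{p}[i]$ are computed once from synthetic data, the initialization of $\mathbf{S}_D$ and $\mathbf{w}_D$ is an arbitrary choice of this example, and the function name `jio_batch_ls` is hypothetical. The updates follow the stationarity conditions of (15) with respect to each $\mathbf{s}_d[i]$ and $\mathbf{w}_D[i]$.

```python
import numpy as np


def jio_batch_ls(R, p, D, T=3):
    """Alternate LS updates of the basis vectors s_d and the filter w_D for fixed R, p."""
    M = R.shape[0]
    S = np.eye(M, D, dtype=complex)   # simple initialization (assumption of this sketch)
    w = np.ones(D, dtype=complex)
    for _ in range(T):
        for d in range(D):
            others = S @ w - S[:, d] * w[d]        # sum over n != d of s_n w_n
            v_d = np.conj(w[d]) * (R @ others)     # cross-correlation term v_d
            S[:, d] = np.linalg.solve(R, p * np.conj(w[d]) - v_d) / abs(w[d]) ** 2
        w = np.linalg.solve(S.conj().T @ R @ S, S.conj().T @ p)   # LS filter for fixed S
    return S, w


rng = np.random.default_rng(4)
P, M, D, lam = 300, 8, 3, 0.998
r_seq = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))
w0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d_seq = r_seq @ w0.conj() + 0.1 * rng.standard_normal(P)   # d[l] = w0^H r[l] + noise

weights = lam ** np.arange(P - 1, -1, -1)
R = (r_seq.T * weights) @ r_seq.conj()        # sum of lam^{i-l} r[l] r^H[l]
p = r_seq.T @ (weights * d_seq.conj())        # sum of lam^{i-l} d^*[l] r[l]

S_D, w_D = jio_batch_ls(R, p, D)
composite = S_D @ w_D                         # equivalent M x 1 filter
```

In this batch setting with unconstrained basis vectors, the composite filter $\mathbf{S}_D[i]\mathbf{w}_D[i]$ coincides with the LS solution $\mathbf{R}^{-1}[i]\mathbf{p}[i]$, so the sketch only illustrates the structure of the alternating updates and not the behaviour of adaptive implementations.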

D. Joint iterative optimization techniques with simplified structures and switching 

In this subsection, we introduce the reader to dimensionality reduction algorithms based on JIO techniques with
simplified structures aided by the concept of switching (JIOS). One disadvantage that is shared amongst the JIO
techniques, eigen-based and Krylov subspace methods is the complexity associated with the design of the matrix
$\mathbf{S}_D[i]$. For instance, if we are to design a dimensionality reduction algorithm with a very large $M$, we still have the
problem of having to design an $M \times D$ matrix $\mathbf{S}_D[i]$ with the computational costs associated with $MD$ complex
coefficients. An approach to circumventing this is based on the design of $\mathbf{S}_D[i]$ with simplified structures, which
can be done in a number of ways. For example, a designer can employ random projections [7] or impose design
constraints on $\mathbf{S}_D[i]$ such that the number of computations can be significantly reduced. The main drawback of
these simplified structures is the associated performance loss evidenced by the large reconstruction error, which is
typically larger than that obtained with more complex dimensionality reduction algorithms. In order to address this
issue, the JIOS framework incorporates multiple simplified structures that are selected according to a switching
mechanism, aiming at minimizing the reconstruction error. Switching techniques play a fundamental role in diversity
systems employed in wireless communications systems and control systems [36]. They can increase the accuracy
of a particular procedure by allowing a signal processing algorithm to choose between a number of signals or
estimates.

The basic idea of the JIOS-based methods is to address the problem of reconstruction error associated with
the design of a transformation $\mathbf{S}_D[i]$. To this end, the strategy is to employ multiple transformation matrices in
order to obtain smaller reconstruction errors by seeking the best available transformation. In order to illustrate the
JIOS framework, we consider the block diagram in Fig. 3 in which multiple transformation matrices $\mathbf{S}_{D,b}[i]$ for
$b = 1, 2, \ldots, B$ are employed and iterations can be performed between $\mathbf{S}_{D,b}[i]$ and the parameter vector $\mathbf{w}_D[i]$.
In addition to this, another goal is to simplify the structure of $\mathbf{S}_{D,b}[i]$ by imposing design constraints, which can
correspond to having very few non-zero coefficients or even having deterministic patterns for each branch $b$.



Fig. 3. JIOS scheme: dimensionality reduction aided by switching. 


One example of the JIOS framework is the joint and iterative interpolation, decimation and filtering (JIDF)
scheme recently reported in [29]. Let us now review the JIDF scheme and describe it using the JIOS framework.
In the JIDF scheme, the basic idea is to employ an interpolator $\mathbf{v}[i]$ with $I$ coefficients, a decimation unit and a
reduced-rank parameter vector $\mathbf{w}_D[i]$ with $D$ coefficients. A key strategy for the design of parameters of the JIDF
scheme is to express the output $x[i] = \mathbf{w}_D^H[i]\mathbf{S}_{D,b}^H[i]\mathbf{r}[i]$ as a function of $\mathbf{v}[i]$, the decimation matrix $\mathbf{D}_b[i]$ and
$\mathbf{w}_D[i]$ as follows:


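$$x[i] = \mathbf{w}_D^H[i]\mathbf{S}_{D,b}^H[i]\mathbf{r}[i] = \mathbf{w}_D^H[i]\mathbf{D}_b[i]\boldsymbol{\Re}_o[i]\mathbf{v}^*[i] = \mathbf{w}_D^H[i]\underbrace{\Big(\mathbf{D}_b[i]\sum_{m=1}^{M}\mathbf{B}_m\mathbf{v}^*[i]\Big)}_{\mathbf{S}_{D,b}^H[i]}\mathbf{r}[i] = \mathbf{v}^H[i]\mathbf{u}[i], \tag{18}$$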


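where $\mathbf{u}[i] = \boldsymbol{\Re}_o^T[i]\mathbf{D}_b^T[i]\mathbf{w}_D^*[i]$ is an $I \times 1$ vector, the $M \times I$ matrix $\mathbf{B}_m$ has an $I$-dimensional identity matrix
starting at the $m$-th row, is shifted down by one position for each $m$ and the remaining elements are zeros, and the
$M \times I$ matrix $\boldsymbol{\Re}_o[i]$ with the samples of $\mathbf{r}[i]$ has a Hankel structure described by

$$\boldsymbol{\Re}_o[i] = \begin{bmatrix} r_0^{[i]} & r_1^{[i]} & \cdots & r_{I-1}^{[i]} \\ r_1^{[i]} & r_2^{[i]} & \cdots & r_I^{[i]} \\ \vdots & \vdots & \ddots & \vdots \\ r_{M-1}^{[i]} & r_M^{[i]} & \cdots & r_{M+I-2}^{[i]} \end{bmatrix}. \tag{19}$$
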

The expression in (18) indicates that the dimensionality reduction carried out by the proposed scheme depends on
finding appropriate $\mathbf{v}[i]$ and $\mathbf{D}_b[i]$ for constructing $\mathbf{S}_{D,b}[i]$.

The design of the decimation matrix $\mathbf{D}_b[i]$ employs for each row the structure:

$$\mathbf{d}_{j,b}[i] = [\underbrace{0 \; \ldots \; 0}_{\gamma_j \text{ zeros}} \;\; 1 \;\; \underbrace{0 \; \ldots \; 0}_{(M-\gamma_j-1) \text{ zeros}}], \tag{20}$$

and the index $j$ ($j = 1, 2, \ldots, D$) denotes the $j$-th row of the matrix, the rank of the matrix $\mathbf{D}_b[i]$ is $D = M/L$,
the decimation factor is $L$ and $B$ corresponds to the number of parallel branches. The quantity $\gamma_j$ is the number
of zeros chosen according to a given design criterion. Given the constrained structure of $\mathbf{D}_b[i]$, it is possible to
devise an optimal procedure for designing $\mathbf{D}_b[i]$ via an exhaustive search of all possible design patterns with the
adjustment of the variable $\gamma_j$. The exhaustive procedure has a total number of patterns equal to $B_{\mathrm{ex}} = \binom{M}{D}$. The
exhaustive scheme is too complex for practical use and it is fundamental to devise decimation schemes that are
cost-effective. By adjusting the variable $\gamma_j$, a designer can obtain various sub-optimal schemes including pre-stored
(e.g. $\gamma_j = (j-1)L + (b-1)$) and random patterns.

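The decimation matrix $\mathbf{D}_b[i]$ is selected to minimize the square of the instantaneous error signal obtained for
the $B$ branches employed as follows

$$\mathbf{D}_b[i] = \mathbf{D}_{b_s}[i] \quad \text{when} \quad b_s = \arg\min_{1 \leq b \leq B} |e_b[i]|^2, \tag{21}$$

where $e_b[i] = d[i] - \mathbf{w}_D^H[i]\mathbf{S}_{D,b}^H[i]\mathbf{r}[i]$. The $D \times 1$ vector $\mathbf{r}_D[i]$ is computed by

$$\mathbf{r}_D[i] = \mathbf{S}_{D,b}^H[i]\mathbf{r}[i] = \mathbf{D}_b[i]\boldsymbol{\Re}_o[i]\mathbf{v}^*[i]. \tag{22}$$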
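
The pre-stored decimation patterns in (20), the Hankel matrix in (19) and the branch selection in (21)-(22) can be sketched as follows. The values of $M$, $L$, $I$ and $B$, the random $\mathbf{v}$ and $\mathbf{w}_D$, and the helper names are assumptions of this example; the pre-stored pattern $\gamma_j = (j-1)L + (b-1)$ requires $B \leq L$ so that every index stays within the $M$ columns.

```python
import numpy as np


def decimation_matrix(M, L, b):
    """D x M pre-stored pattern with gamma_j = (j-1)L + (b-1), following (20)."""
    D = M // L
    Db = np.zeros((D, M))
    for j in range(1, D + 1):
        gamma_j = (j - 1) * L + (b - 1)   # number of leading zeros in row j
        Db[j - 1, gamma_j] = 1.0
    return Db


def hankel_matrix(samples, M, I):
    """M x I Hankel matrix whose (m, k) entry is samples[m + k], as in (19)."""
    return np.array([samples[m:m + I] for m in range(M)])


def select_branch(d, w_D, v, Ro, M, L, B):
    """Return (branch, squared error, r_D) for the smallest |e_b|^2, as in (21)."""
    best = None
    for b in range(1, B + 1):
        r_D = decimation_matrix(M, L, b) @ Ro @ v.conj()   # eq. (22)
        e_b = d - np.vdot(w_D, r_D)                        # d - w_D^H r_D
        if best is None or abs(e_b) ** 2 < best[1]:
            best = (b, abs(e_b) ** 2, r_D)
    return best


rng = np.random.default_rng(5)
M, L, I, B = 8, 4, 3, 4
D = M // L
samples = rng.standard_normal(M + I - 1) + 1j * rng.standard_normal(M + I - 1)
Ro = hankel_matrix(samples, M, I)
v = rng.standard_normal(I) + 1j * rng.standard_normal(I)      # placeholder interpolator
w_D = rng.standard_normal(D) + 1j * rng.standard_normal(D)    # placeholder filter
d = 1.0 + 0.0j                                                # placeholder desired symbol

b_s, err, r_D = select_branch(d, w_D, v, Ro, M, L, B)
```

Each branch only selects $D$ of the $M$ interpolated samples, so the cost of evaluating the $B$ candidate errors is dominated by the interpolation, which is shared by all branches in a practical implementation.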

The design of S^) and W£)[z] corresponds to solving the following optimization problem 

i 

[SD[i], wd[*]] = arg min - wg[i]D6[/]9?o[f]v*[i]|^, 

-irl/Vl TirI'll ' * 


( 22 ) 


(23) 


V[2j,Wi5[2j 


1 = 1 


It should be remarked that the optimization problem in (|2^ is non convex, however, the methods do not present 
problems with local minima. Proofs of convergence are difficult due to the switching mechanism and constitute an 
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interesting open problem. Using a similar approach to the JIO technique, we obtain the following recursions for 
computing v[i] and W£)[i]: 


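$$\mathbf{v}^{(t)}[i] = \mathbf{R}_u^{-1}[i]\,\mathbf{p}_u^{(t)}[i], \quad t = 1, 2, \ldots, T, \qquad (24)$$

$$\mathbf{w}_D^{(t)}[i] = \mathbf{R}_D^{-1}[i]\,\mathbf{p}_D^{(t)}[i], \quad t = 1, 2, \ldots, T, \qquad (25)$$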


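where $\mathbf{p}_u^{(t)}[i] = \sum_{l=1}^{i} \lambda^{i-l} d^*[l]\mathbf{u}^{(t)}[l]$, $\mathbf{R}_u[i] = \sum_{l=1}^{i} \lambda^{i-l}\mathbf{u}^{(t)}[l]\mathbf{u}^{(t)\,H}[l]$, $\mathbf{p}_D^{(t)}[i] = \sum_{l=1}^{i} \lambda^{i-l} d^*[l]\mathbf{r}_D^{(t)}[l]$ and
$\mathbf{R}_D[i] = \sum_{l=1}^{i} \lambda^{i-l}\mathbf{r}_D^{(t)}[l]\mathbf{r}_D^{(t)\,H}[l]$.

In terms of complexity, the computational cost of the JIDF scheme scales linearly with
$DIM$ and quadratically with $I$ and $D$. Since $I$ and $D$ are typically very small (3-5 coefficients), this makes the
JIDF scheme a low-complexity alternative as compared with PC, Krylov and JIO schemes.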
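To make the alternation between the interpolator and the reduced-rank filter concrete, the sketch below evaluates the weighted sums behind $\mathbf{R}_u$, $\mathbf{p}_u$, $\mathbf{R}_D$ and $\mathbf{p}_D$ in batch form over stored snapshots and alternates the two least-squares solutions. It is a simplified illustration under several assumptions that are not in the text: the decimation matrix is held fixed (no branch switching), the matrix $\boldsymbol{\Re}_o$ is taken to be an $M \times I$ zero-padded Hankel matrix of the received vector, $\mathbf{u}[l]$ is taken as $(\mathbf{D}_b\boldsymbol{\Re}_o[l])^T\mathbf{w}_D^*$ so that the filter output equals $\mathbf{v}^H\mathbf{u}[l]$, and all function names are hypothetical.

```python
import numpy as np

def hankel_matrix(r, I):
    """M x I matrix with rows [r[m], ..., r[m+I-1]], zero-padded past the end.
    This layout is an assumption made for the sketch."""
    padded = np.concatenate([r, np.zeros(I - 1, dtype=r.dtype)])
    return np.array([padded[m:m + I] for m in range(len(r))])

def weighted_cost(X, d, v, w, lam):
    """sum_l lam^(n-l) |d[l] - w^H X[l] v^*|^2 over the n stored snapshots."""
    n = len(d)
    weights = lam ** np.arange(n - 1, -1, -1)
    errors = np.array([d[l] - np.vdot(w, X[l] @ v.conj()) for l in range(n)])
    return float(np.sum(weights * np.abs(errors) ** 2))

def alternate_v_w(X, d, v, w, lam=0.998, T=3):
    """Alternate the two least-squares solutions for T iterations.
    X[l] holds the D x I product D_b R_o[l] for a fixed decimation matrix."""
    n = len(d)
    weights = lam ** np.arange(n - 1, -1, -1)
    costs = [weighted_cost(X, d, v, w, lam)]
    for _ in range(T):
        # Interpolator step: u[l] = X[l]^T w^*, so that w^H X[l] v^* = v^H u[l].
        U = np.array([X[l].T @ w.conj() for l in range(n)])
        R_u = (U.T * weights) @ U.conj()          # sum of weighted u u^H
        p_u = (U.T * weights) @ d.conj()          # sum of weighted d^* u
        v = np.linalg.solve(R_u, p_u)
        costs.append(weighted_cost(X, d, v, w, lam))
        # Filter step: r_D[l] = X[l] v^*.
        R_red = np.array([X[l] @ v.conj() for l in range(n)])
        R_D = (R_red.T * weights) @ R_red.conj()  # sum of weighted r_D r_D^H
        p_D = (R_red.T * weights) @ d.conj()      # sum of weighted d^* r_D
        w = np.linalg.solve(R_D, p_D)
        costs.append(weighted_cost(X, d, v, w, lam))
    return v, w, costs

rng = np.random.default_rng(0)
M, I, L, n = 8, 3, 4, 200
D = M // L
D_b = np.zeros((D, M))
D_b[np.arange(D), np.arange(D) * L] = 1.0         # fixed pre-stored pattern, branch b = 1
snapshots = rng.standard_normal((n, M)) + 1j * rng.standard_normal((n, M))
X = np.array([D_b @ hankel_matrix(r, I) for r in snapshots])
v_true = rng.standard_normal(I) + 1j * rng.standard_normal(I)
w_true = rng.standard_normal(D) + 1j * rng.standard_normal(D)
d = np.array([np.vdot(w_true, X[l] @ v_true.conj()) for l in range(n)])
d = d + 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

v0 = np.zeros(I, dtype=complex)
v0[0] = 1.0
w0 = np.zeros(D, dtype=complex)
w0[0] = 1.0
v_hat, w_hat, costs = alternate_v_w(X, d, v0, w0)
```

Because each step solves its least-squares subproblem exactly with the other variable fixed, the recorded cost cannot increase from one step to the next, which is the property this sketch is meant to show. An adaptive implementation would instead update the four statistics recursively at each time instant.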


E. Summary of Dimensionality Reduction Algorithms 

In this part, we summarize the most representative LS-based algorithms for dimensionality reduction presented
in the previous subsections and provide a table (Table I) that explains how to simulate these algorithms. Specifically, we
consider the PC method [5], the MSWF, the JIO and the JIDF [29] techniques.
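As a hedged illustration of the PC and MSWF (Krylov) entries of Table I, the Python sketch below updates the exponentially weighted statistics $\mathbf{R}[i]$ and $\mathbf{p}[i]$, builds the two transformation matrices and computes the reduced-rank filter $\mathbf{w}_D[i] = (\mathbf{S}_D^H\mathbf{R}\mathbf{S}_D)^{-1}\mathbf{S}_D^H\mathbf{p}$. The function names, the initial value $\mathbf{R}[0] = 0.01\,\mathbf{I}$, the synthetic data and the choice of the eigenvectors with the $D$ largest eigenvalues for PC are assumptions of this sketch.

```python
import numpy as np

def update_statistics(R, p, r, d, lam):
    """One recursion of R[i] = lam R[i-1] + r r^H and p[i] = lam p[i-1] + r d^*."""
    return lam * R + np.outer(r, r.conj()), lam * p + r * np.conj(d)

def pc_basis(R, D):
    """M x D matrix with the eigenvectors of the D largest eigenvalues of R."""
    eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:D]
    return eigvecs[:, order]

def krylov_basis(R, p, D):
    """M x D matrix [q, R q, ..., R^(D-1) q] with q = p / ||p||."""
    cols = [p / np.linalg.norm(p)]
    for _ in range(D - 1):
        cols.append(R @ cols[-1])
    return np.column_stack(cols)

def reduced_rank_filter(S, R, p):
    """w_D = (S^H R S)^{-1} S^H p."""
    return np.linalg.solve(S.conj().T @ R @ S, S.conj().T @ p)

rng = np.random.default_rng(0)
M, D, lam, n = 6, 2, 0.998, 300
w_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
R = 0.01 * np.eye(M, dtype=complex)           # assumed initial value R[0]
p = np.zeros(M, dtype=complex)
for _ in range(n):
    r = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    d = np.vdot(w_true, r) + 0.1 * rng.standard_normal()
    R, p = update_statistics(R, p, r, d, lam)

S_pc = pc_basis(R, D)
S_kr = krylov_basis(R, p, D)
w_pc = reduced_rank_filter(S_pc, R, p)
w_kr = reduced_rank_filter(S_kr, R, p)
```

In an adaptive run the basis and the filter would be recomputed at every time instant inside the loop; they are computed once at the end here to keep the sketch short. With $D = M$ the PC solution mapped back through $\mathbf{S}_D$ coincides with the full-rank least-squares filter $\mathbf{R}^{-1}\mathbf{p}$.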

V. Applications: 

In this section, we will describe a number of applications for reduced-rank signal processing and dimensionality 
reduction algorithms and link them with new research directions and emerging fields. Among the key areas for these 
methods are wireless communications, sensor and array signal processing, speech and audio processing, image and 
video processing. A key aspect that will be considered is the need for dimensionality reduction and typical values for
the dimension M of the observed vector, the model order D and the compression ratio CR = M/D.

A. Wireless communications: 

In wireless communications, a designer must deal with stringent requirements in terms of quality of service, an
increasing demand for higher data rates and scenarios with time-varying channels. At the heart of the problems
in wireless communications lies the need for designing transmitters and receivers, algorithms for data detection and
channel and parameter estimation. These problems are ubiquitous and common to spread spectrum, multi-carrier 
and multiple antenna systems. Specifically, when the number of parameters grows beyond a certain level (which is 
often the case), the level of interference is high and the channel is time-varying, reduced-rank signal processing and 
algorithms for dimensionality reduction can play a decisive role in the design of wireless communication systems. 
For instance, in spread spectrum systems we often encounter receiver design problems that require the computation 


TABLE I 

Dimensionality reduction algorithms. 
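PC:

Step 1: Initialization:

Set values for $\lambda$, the model order $D$ and $\mathbf{R}[0]$.

Step 2: For $i = 1, 2, \ldots, P$:

(1) Compute $\mathbf{R}[i] = \lambda\mathbf{R}[i-1] + \mathbf{r}[i]\mathbf{r}^H[i]$,

(2) Compute $\mathbf{p}[i] = \lambda\mathbf{p}[i-1] + \mathbf{r}[i]d^*[i]$,

(3) Perform an eigen-decomposition of $\mathbf{R}[i] = \boldsymbol{\Phi}[i]\boldsymbol{\Lambda}[i]\boldsymbol{\Phi}^H[i]$,

(4) Compute $\mathbf{S}_D[i] = \boldsymbol{\Phi}_{1:M,1:D}[i]$,

(5) Calculate $\mathbf{w}_D[i] = (\mathbf{S}_D^H[i]\mathbf{R}[i]\mathbf{S}_D[i])^{-1}\mathbf{S}_D^H[i]\mathbf{p}[i]$.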

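MSWF:

Step 1: Initialization:

Set values for $\lambda$, the model order $D$ and $\mathbf{R}[0]$.

Step 2: For $i = 1, 2, \ldots, P$:

(1) Compute $\mathbf{R}[i] = \lambda\mathbf{R}[i-1] + \mathbf{r}[i]\mathbf{r}^H[i]$,

(2) Compute $\mathbf{p}[i] = \lambda\mathbf{p}[i-1] + \mathbf{r}[i]d^*[i]$,

(3) Construct $\mathbf{S}_D[i] = [\mathbf{q}[i]\ \ \mathbf{R}[i]\mathbf{q}[i]\ \ \ldots\ \ \mathbf{R}^{D-1}[i]\mathbf{q}[i]]$,
where $\mathbf{q}[i] = \mathbf{p}[i]/\|\mathbf{p}[i]\|$,

(4) Calculate $\mathbf{w}_D[i] = (\mathbf{S}_D^H[i]\mathbf{R}[i]\mathbf{S}_D[i])^{-1}\mathbf{S}_D^H[i]\mathbf{p}[i]$.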

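JIO:

Step 1: Initialization:

Set values for $\lambda$, the model order $D$ and $\mathbf{R}[0]$.

Step 2: For $i = 1, 2, \ldots, P$:

(1) Compute $\mathbf{R}[i] = \lambda\mathbf{R}[i-1] + \mathbf{r}[i]\mathbf{r}^H[i]$,

(2) Obtain $\mathbf{p}[i] = \lambda\mathbf{p}[i-1] + \mathbf{r}[i]d^*[i]$,

(3) For $t = 1, 2, \ldots, T$ and $d = 1, 2, \ldots, D$:

(4) Compute $\mathbf{s}_d^{(t)}[i] = \frac{1}{w_d^{(t-1)}[i]}\mathbf{R}^{-1}[i]\big(\mathbf{p}[i] - \mathbf{v}_d^{(t)}[i]\big)$,

(5) Compute $\mathbf{w}_D^{(t)}[i] = (\mathbf{S}_D^{(t)\,H}[i]\mathbf{R}[i]\mathbf{S}_D^{(t)}[i])^{-1}\mathbf{S}_D^{(t)\,H}[i]\mathbf{p}[i]$,

where $\mathbf{v}_d^{(t)}[i] = \sum_{n=1,\,n \neq d}^{D} \mathbf{R}[i]\mathbf{s}_n^{(t)}[i]w_n^{(t-1)}[i]$.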

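JIDF:

Step 1: Initialization:

Set values for $\lambda$, the model orders $D$ and $I$, $\mathbf{R}_u[0]$ and $\mathbf{R}_D[0]$.

Step 2: For $i = 1, 2, \ldots, P$ and $t = 1, 2, \ldots, T$:

(1) Determine $\mathbf{D}_b[i] = \mathbf{D}_{b_s}[i]$ when $b_s = \arg\min_{1 \le b \le B} |e_b[i]|^2$,

(2) Calculate $\mathbf{u}^{(t)}[i]$ and $\mathbf{r}_D^{(t)}[i] = \mathbf{S}_D^{(t)\,H}[i]\mathbf{r}[i]$,

(3) Compute $\mathbf{R}_u[i] = \lambda\mathbf{R}_u[i-1] + \mathbf{u}^{(t)}[i]\mathbf{u}^{(t)\,H}[i]$ and $\mathbf{p}_u^{(t)}[i] = \lambda\mathbf{p}_u^{(t)}[i-1] + d^*[i]\mathbf{u}^{(t)}[i]$,

(4) Compute $\mathbf{R}_D[i] = \lambda\mathbf{R}_D[i-1] + \mathbf{r}_D^{(t)}[i]\mathbf{r}_D^{(t)\,H}[i]$ and $\mathbf{p}_D^{(t)}[i] = \lambda\mathbf{p}_D^{(t)}[i-1] + d^*[i]\mathbf{r}_D^{(t)}[i]$,

(5) Obtain $\mathbf{v}^{(t)}[i] = \mathbf{R}_u^{-1}[i]\mathbf{p}_u^{(t)}[i]$,

(6) Construct $\mathbf{S}_D^{(t)}[i]$ from $\mathbf{D}_b[i]$ and $\mathbf{v}^{(t)}[i]$,

(7) Obtain $\mathbf{w}_D^{(t)}[i] = \mathbf{R}_D^{-1}[i]\mathbf{p}_D^{(t)}[i]$.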


of a few hundred parameters, i.e., M = 100, 200, ..., and we typically wish to perform a dimensionality reduction
which leads to a model order of a few elements, i.e., D = 4, 5, and which yields a CR > 25. In a multi-antenna
system in the presence of flat fading, the number of parameters is typically small (2, 4) for spatial processing and
corresponds to the number of antennas used at the transmitter and at the receiver. However, if we consider
multi-antenna systems in the presence of frequency selective channels, the numbers can increase to dozens of coefficients.
In addition, the combination of multi-antenna systems with multi-carrier transmissions (e.g., MIMO-OFDM) in
the presence of time-selective channels can require the equalization of structures with hundreds of coefficients.
Therefore, the use of reduced-rank signal processing techniques and dimensionality reduction algorithms can be of
fundamental importance in the problems previously described. In what follows, we will illustrate with a numerical
example the design of a space-time reduced-rank linear receiver for interference suppression in a spread spectrum
system equipped with antenna arrays.

Numerical Example: Space-Time Interference Suppression for Spread Spectrum Systems

We consider the uplink of a direct-sequence code-division multiple access (DS-CDMA) system with symbol
interval $T$, chip period $T_c$, spreading gain $N = T/T_c$, $K$ users and multipath channels with the maximum number of
propagation paths $L$, where $L < N$. The system is equipped with an antenna that consists of a uniform linear array
(ULA) with $J$ sensor elements. The spacing between the ULA elements is $d = \lambda_c/2$, where $\lambda_c$ is the carrier wavelength.
We assume that the channel is constant during each symbol, the base station receiver is perfectly synchronized and
the delays of the propagation paths are multiples of the chip period. The received signal, after filtering by a chip-pulse
matched filter and sampling at the chip period, yields the $M \times 1$ received vector at time $i$
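$$\mathbf{r}[i] = \mathbf{H}[i]\mathbf{s}[i] + \mathbf{n}[i] = \sum_{k=1}^{K} \big( A_k s_k[i-1]\mathbf{p}_k[i-1] + A_k s_k[i]\mathbf{p}_k[i] + A_k s_k[i+1]\mathbf{p}_k[i+1] \big) + \boldsymbol{\eta}[i] + \mathbf{n}[i], \qquad (26)$$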

where $M = J(N + L - 1)$, $s_k[i]$ denotes the data symbol of user $k$ at time $i$, $A_k$ is the amplitude of user $k$, the
complex Gaussian noise vector is $\mathbf{n}[i] = [n_1[i] \ldots n_M[i]]^T$ with $E[\mathbf{n}[i]\mathbf{n}^H[i]] = \sigma^2\mathbf{I}$, $\boldsymbol{\eta}[i]$ corresponds to
the intersymbol interference, $(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively, and $E[\cdot]$ stands for
expected value. The spatial signatures for previous, current and future data symbols are $\mathbf{p}_k[i-1]$, $\mathbf{p}_k[i]$ and $\mathbf{p}_k[i+1]$, and
are constructed by stacking the convolution between the signature sequence $\mathbf{a}_k = [a_k(1) \ldots a_k(N)]^T$ of user
$k$ and the $L \times 1$ channel vector $\mathbf{h}_{k,j}[i] = [h_{k,j,0}[i] \ldots h_{k,j,L-1}[i]]^T$ of user $k$ at each antenna element $j = 1, 2, \ldots, J$.
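The construction of the signature for the current symbol can be sketched as follows. This is only an illustration of the stacking described above: the function name `spatial_signature`, the binary code normalized by $\sqrt{N}$ and the random channel taps are assumptions, and the signatures of the previous and future symbols (shifted versions that model the intersymbol interference) are not built here.

```python
import numpy as np

def spatial_signature(code, channels):
    """Stack, antenna by antenna, the convolution of the length-N spreading
    code with each L-tap channel; the result has J * (N + L - 1) entries."""
    return np.concatenate([np.convolve(code, h_j) for h_j in channels])

rng = np.random.default_rng(1)
N, L, J = 16, 9, 2
code = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)       # assumed normalization
channels = rng.standard_normal((J, L)) + 1j * rng.standard_normal((J, L))
p_k = spatial_signature(code, channels)
M = p_k.size                                               # J * (N + L - 1)
```

With $N = 16$ and $L = 9$ each antenna contributes $N + L - 1 = 24$ samples, so $J = 2$ gives $M = 48$ and $J = 4$ gives $M = 96$.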

A linear reduced-rank space-time receiver for a desired user $k$ can be designed by linearly combining the received
vector $\mathbf{r}[i]$ with the transformation $\mathbf{S}_D[i]$ and the reduced-rank parameter vector $\mathbf{w}_D[i]$ as expressed by

$$x_k[i] = \mathbf{w}_D^H[i]\mathbf{S}_D^H[i]\mathbf{r}[i], \qquad (27)$$

where the data detection corresponds to applying the signal $x_k[i]$ to a decision device as given by $\hat{s}_k[i] = Q(x_k[i])$,
where $Q(\cdot)$ is the function that implements the decision device and depends on the modulation.
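A minimal Python sketch of (27) followed by the decision device is given below, assuming unit-energy QPSK symbols so that $Q(\cdot)$ keeps the signs of the real and imaginary parts. The function names and the toy vectors are hypothetical.

```python
import numpy as np

def receiver_output(w_D, S_D, r):
    """x_k[i] = w_D^H S_D^H r for an M x D matrix S_D and a D x 1 vector w_D."""
    return np.vdot(w_D, S_D.conj().T @ r)

def qpsk_decision(x):
    """Decision device for unit-energy QPSK (assumed modulation)."""
    return (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)

M, D = 4, 2
S_D = np.vstack([np.eye(D), np.zeros((M - D, D))])   # [I_D 0]^T
w_D = np.array([1.0, 0.0], dtype=complex)             # [1 0 ... 0]^T
r = np.array([0.9 - 1.1j, 0.2j, -0.3, 0.5])
x = receiver_output(w_D, S_D, r)
s_hat = qpsk_decision(x)
```

With these initial-style values the receiver simply passes the first entry of $\mathbf{r}[i]$ to the decision device.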

Let us now consider simulation examples of the space-time reduced-rank receiver described above in which
the dimensionality reduction techniques using LS optimization are compared. In all simulations, we use the initial
values $\mathbf{w}_D[0] = [1\ 0\ \ldots\ 0]^T$ and $\mathbf{S}_D[0] = [\mathbf{I}_D\ \mathbf{0}_{D,M-D}]^T$, $\lambda = 0.998$, employ randomly generated spreading
codes with $N = 16$, assume $L = 9$ as an upper bound, use 3-path channels with relative powers given by 0, −3
and −6 dB, where in each run the spacing between paths is obtained from a discrete uniform random variable
between 1 and 2 chips, and average the experiments over 1000 runs. The system has a power distribution among
the users for each run that follows a log-normal distribution with associated standard deviation equal to 1.5 dB.
We consider antenna arrays equipped with J = 2 and J = 4 elements. The dimension M of the observed signal
in these examples is $M = J(N + L - 1)$, which corresponds to M = 48 (for J = 2) and M = 96
(for J = 4). These figures lead to a CR > 20 (for J = 4). We compare the full-rank, the PC method [5], the
MSWF Krylov subspace technique, the JIO technique and the JIOS (JIDF) scheme with an optimized
model order D for each method. We also include the linear full-rank MMSE receiver that assumes the knowledge
of the channels and the noise variance at the receiver. Each reduced-rank scheme provides an estimate of the desired
symbol for the desired user (user 1 in all experiments) and we assess the bit error rate (BER) against the number of
symbols. We transmit packets with P = 1500 QPSK symbols, of which 250 are used for training. After the training
phase, the space-time receivers are switched to decision-directed mode. In Fig. 4 we show an example of the BER
performance versus the number of symbols, whereas in Fig. 5 we consider examples with the BER performance
versus the SNR and the number of users.



Fig. 4. BER performance versus number of received symbols. 


The curves depicted in Figs. 4 and 5 show that the dimensionality reduction applied to the received signals
combined with the reduced-rank processing can significantly accelerate the convergence of the adaptive receivers.
The best results are obtained by the JIDF and JIO methods, which approach the linear MMSE receiver and are
followed by the MSWF, the PC-based receiver and the full-rank technique. The algorithms analyzed show very
good performance for different values of SNR and number of users in the system. A key feature to be remarked
is the ability of the subspace-based algorithms to converge faster and to obtain good results in short data records,
reducing the requirements for training.



Fig. 5. BER performance against (a) $E_b/N_0$ (dB) and (b) number of users ($K$).


B. Sensor and array signal processing: 

The basic aim of sensor and array signal processing is to consider temporal and spatial information, captured
by sampling a wave field with a set of appropriately placed antenna elements or sensor devices. These devices
are organized in patterns or arrays which are used to detect signals and to determine information about them. The
wave field is assumed to be generated by a finite number of emitters, and contains information about signal parameters
characterizing the emitters. A number of applications for sensor and array signal processing have emerged in the
last decades and include active noise and vibration control, beamforming, direction finding, harmonic retrieval,
distributed processing for networks, radar and sonar systems. In these applications, when the number of parameters
grows beyond a certain level, the signal-to-noise ratio is low and the level of interference is high, reduced-rank
signal processing and algorithms for dimensionality reduction can offer an improved performance as compared
with conventional full-rank techniques. For example, in broadband beamforming or space-time adaptive processing
applications for radar we may have to deal with an optimization problem that requires the processing of observed
signals with dozens to hundreds of coefficients, i.e., 50 < M < 200. In this case, a dimensionality reduction
which leads to a model order with only a few elements, i.e., D = 4, 5, and which has a CR > 10 can facilitate
the design and be highly beneficial to the performance of the system. A number of other problems in sensor
and array signal processing can be cost-effectively addressed with reduced-rank signal processing techniques and
dimensionality reduction algorithms. In what follows, we will illustrate with a numerical example the design of
adaptive beamforming techniques.

Numerical Example: Adaptive Reduced-Rank Beamforming

Let us consider a smart antenna system equipped with a uniform linear array (ULA) of $M$ elements. Assuming
that the sources are in the far field of the array, the signals of $K$ narrowband sources impinge on the array ($K < M$)
with unknown directions of arrival (DOA) $\theta_l$ for $l = 1, 2, \ldots, K$. The input data from the antenna array can be
organized in an $M \times 1$ observed vector expressed by


$$\mathbf{r}[i] = \mathbf{H}\mathbf{s}[i] + \mathbf{n}[i] = \mathbf{A}(\boldsymbol{\theta})\mathbf{s}[i] + \mathbf{n}[i], \qquad (28)$$


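where $\mathbf{A}(\boldsymbol{\theta}) = [\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_K)]$ is the $M \times K$ matrix of signal steering vectors. The $M \times 1$ signal steering vector
is defined as

$$\mathbf{a}(\theta_l) = \Big[ 1,\ e^{-2\pi j \frac{d_s}{\lambda_c}\cos\theta_l},\ \ldots,\ e^{-2\pi j (M-1)\frac{d_s}{\lambda_c}\cos\theta_l} \Big]^T \qquad (29)$$

for a signal impinging at angle $\theta_l$, $l = 1, 2, \ldots, K$, where $d_s = \lambda_c/2$ is the inter-element spacing, $\lambda_c$ is the
wavelength and $(\cdot)^T$ denotes the transpose operation. The vector $\mathbf{n}[i]$ denotes the complex vector of sensor noise,
which is assumed to be zero-mean and Gaussian with covariance matrix $\sigma^2\mathbf{I}$.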
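The steering vector in (29) and the matrix $\mathbf{A}(\boldsymbol{\theta})$ in (28) can be generated with the short Python sketch below. The function names and the default ratio $d_s/\lambda_c = 0.5$ (half-wavelength spacing) are assumptions used for illustration.

```python
import numpy as np

def steering_vector(theta_rad, M, spacing_over_wavelength=0.5):
    """ULA steering vector with entries exp(-2*pi*j*m*(d_s/lambda_c)*cos(theta)),
    m = 0, ..., M - 1."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * m * spacing_over_wavelength * np.cos(theta_rad))

def steering_matrix(thetas_rad, M, spacing_over_wavelength=0.5):
    """M x K matrix whose columns are the steering vectors of the K sources."""
    return np.column_stack(
        [steering_vector(t, M, spacing_over_wavelength) for t in thetas_rad]
    )

A = steering_matrix(np.deg2rad([15.0, 45.0, 60.0]), M=8)
```

Every entry has unit modulus, and for a source at $\theta = 90^\circ$ the phase term vanishes so that all entries are equal to one.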

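Let us now consider the design of an adaptive reduced-rank minimum variance distortionless response (MVDR)
beamformer. The design problem consists of computing the transformation $\mathbf{S}_D[i]$ and $\mathbf{w}_D[i]$ via the following
optimization problem

$$[\mathbf{S}_D[i], \mathbf{w}_D[i]] = \arg\min_{\mathbf{S}_D[i],\,\mathbf{w}_D[i]} \mathbf{w}_D^H[i]\mathbf{S}_D^H[i]\mathbf{R}[i]\mathbf{S}_D[i]\mathbf{w}_D[i] \quad \text{subject to} \quad \mathbf{w}_D^H[i]\mathbf{S}_D^H[i]\mathbf{a}(\theta_k) = 1. \qquad (30)$$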
In order to solve the above problem, we can resort to the algorithms outlined in Section IV. The main difference is that
now we need to minimize the mean-square value of the output of the array and enforce the constraint that ensures
the response of the reduced-rank beamforming algorithm to be equal to unity.
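For a fixed transformation $\mathbf{S}_D$, the constrained problem in (30) reduces to a standard MVDR design in the $D$-dimensional subspace, whose well-known closed form is $\mathbf{w}_D = \bar{\mathbf{R}}^{-1}\bar{\mathbf{a}} / (\bar{\mathbf{a}}^H\bar{\mathbf{R}}^{-1}\bar{\mathbf{a}})$ with $\bar{\mathbf{R}} = \mathbf{S}_D^H\mathbf{R}\mathbf{S}_D$ and $\bar{\mathbf{a}} = \mathbf{S}_D^H\mathbf{a}(\theta_k)$. The Python sketch below covers only this filter step; the joint design of $\mathbf{S}_D$ is not shown, and the function name, the barred symbols and the random test data are assumptions.

```python
import numpy as np

def reduced_rank_mvdr(S_D, R, a):
    """MVDR weights in the subspace spanned by the columns of a fixed S_D."""
    R_bar = S_D.conj().T @ R @ S_D          # D x D reduced-rank covariance
    a_bar = S_D.conj().T @ a                # D x 1 reduced-rank steering vector
    z = np.linalg.solve(R_bar, a_bar)
    return z / np.vdot(a_bar, z)            # enforces w_D^H S_D^H a = 1

rng = np.random.default_rng(0)
M, D = 8, 3
a = np.exp(-1j * np.pi * np.arange(M) * np.cos(np.deg2rad(15.0)))  # half-wavelength ULA
X = rng.standard_normal((M, 50)) + 1j * rng.standard_normal((M, 50))
R = X @ X.conj().T / 50 + np.eye(M)         # synthetic Hermitian positive definite matrix
S_D = rng.standard_normal((M, D)) + 1j * rng.standard_normal((M, D))
w_D = reduced_rank_mvdr(S_D, R, a)
response = np.vdot(w_D, S_D.conj().T @ a)   # should equal 1
```

Setting $\mathbf{S}_D = \mathbf{I}$ recovers the full-rank MVDR beamformer, whose output power is $1/(\mathbf{a}^H\mathbf{R}^{-1}\mathbf{a})$.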


Fig. 6. SINR performance versus number of snapshots (N = 24, K = 6 users for 1 ≤ i ≤ 800, K = 8 users for 801 ≤ i ≤ 1600, SNR = 12 dB).


We consider an example of adaptive beamforming in a non-stationary scenario where the system has 6 users
with equal power and the environment experiences a sudden change at time i = 800. We assess the performance
of the system in terms of the signal-to-interference-plus-noise ratio (SINR), which is defined as

$$\text{SINR} = \frac{\mathbf{w}_D^H[i]\mathbf{S}_D^H[i]\mathbf{R}_s[i]\mathbf{S}_D[i]\mathbf{w}_D[i]}{\mathbf{w}_D^H[i]\mathbf{S}_D^H[i]\mathbf{R}_I[i]\mathbf{S}_D[i]\mathbf{w}_D[i]}, \qquad (31)$$

where $\mathbf{R}_s[i]$ denotes the covariance matrix of the signal of interest (SoI) and $\mathbf{R}_I[i]$ is the covariance matrix of the
interference. In the example, we consider the full-rank, the PC [5], the MSWF, the AVF [23], the JIO,
the JIDF [29] and the MVDR that assumes perfect knowledge of the covariance matrix. The 5 interferers impinge
on the ULA at −60°, −30°, 0°, 45°, 60° with equal powers to the SoI, which impinges on the array at 15°. At the
time instant i = 800 we have 3 interferers with 5 dB above the SoI's power level entering the system with DoAs
−45°, −15° and 30°, whereas one interferer with DoA 45° and a power level equal to the SoI exits the system. The
results of this example are depicted in Fig. 6. The curves show that the reduced-rank algorithms have a superior
performance to the full-rank algorithm. The best performance is obtained by the JIDF scheme, which is followed
by the JIO, the AVF, the MSWF, the PC and the full-rank algorithms.
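The SINR in (31) can be evaluated for a given pair $(\mathbf{S}_D, \mathbf{w}_D)$ as in the Python sketch below. The function name `output_sinr` and the toy covariance matrices are assumptions; in this toy case the matrix passed as $\mathbf{R}_I$ contains only white noise of variance 0.1.

```python
import numpy as np

def output_sinr(w_D, S_D, R_s, R_I):
    """Ratio of w^H R_s w to w^H R_I w for the equivalent filter w = S_D w_D."""
    w = S_D @ w_D
    signal_power = np.real(np.vdot(w, R_s @ w))
    interference_power = np.real(np.vdot(w, R_I @ w))
    return signal_power / interference_power

M = 4
a = np.ones(M, dtype=complex)        # steering vector of a broadside source (toy choice)
R_s = np.outer(a, a.conj())          # unit-power signal of interest
R_I = 0.1 * np.eye(M)                # white noise only, variance 0.1
S_D = np.eye(M)                      # full-rank case for simplicity
w_D = a / M                          # satisfies w^H a = 1
sinr = output_sinr(w_D, S_D, R_s, R_I)
sinr_db = 10 * np.log10(sinr)
```

Here the signal power at the output is 1 and the noise power is $0.1 \cdot \|\mathbf{w}\|^2 = 0.025$, so the SINR equals 40, or about 16 dB.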

C. Audio, speech, image, and video processing: 

Echo cancellation, prediction, compression and recognition of multimedia signals. 

D. Modelling for non-linear and large problems: 

Neural networks and other bio-inspired structures, Volterra series.

VI. New frontiers and research directions: 

We will consider the relationships between reduced-rank signal processing and algorithms for dimensionality 
reduction with emerging techniques that include compressive sensing and tensor decompositions. 

Compressive sensing techniques [41]-[43] can substantially improve the performance of sensor array processing
systems including MIMO radars by taking into account and exploiting the sparse nature of the signals encountered
in these systems. In the literature of signal processing and information theory it has been recently shown that the use
of compressive sensing techniques can provide very significant gains in performance while requiring lower
computational complexity than existing techniques due to a smarter way of processing information.
The main idea behind compressive sensing methods is to use linear projections that extract the key information 
from signals and then employ a reconstruction algorithm based on optimization techniques. In particular, the linear 
projections are intended to collect the samples that are meaningful for the rest of the procedure and perform 
significant signal compression. These linear projections are essentially dimensionality reduction procedures. Samples 
with very small magnitude that cannot be discerned from noise are typically good candidates for elimination. This 
is followed by a reconstruction algorithm that aims to recreate the original signal from the compressed version. 

VII. Concluding Remarks 

In this tutorial on reduced-rank signal processing, we reviewed design methods and algorithms for dimensionality 
reduction, and discussed a number of important applications. A general framework based on linear algebra and 
linear estimation was employed to introduce the reader to the fundamentals of reduced-rank signal processing and 
to describe how dimensionality reduction is performed on an observed discrete-time signal. A unified treatment of 
dimensionality reduction algorithms was presented and used to describe some key algorithms for dimensionality 
reduction and reduced-rank processing. A number of applications were considered and several examples were
provided to illustrate this important area.